DBCHEM: A Database Query Based Solution for the Chemical Compound and Drug Name Recognition Task

Authors

  • Caglar Ata
  • Tolga Can
Abstract

We propose DBCHEM, a method based on database queries, for the chemical compound and drug name recognition task of the BioCreative IV challenge. We prepared a database of 145 million entries containing compound and drug names, their synonyms, and molecular formulas. The database was constructed using the PubChem Power User Gateway (PUG) system. Candidate chemical and drug names are identified by using an English dictionary as a list of stop words, and every candidate is then queried in the compound database. We integrated a small number of heuristic rules into this query-based approach. On the development set, DBCHEM attained 58% precision and 71% recall, with a total running time of 14 minutes for 3,500 articles.
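The query-based pipeline summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the DBCHEM implementation: an in-memory SQLite table stands in for the 145-million-entry PubChem-derived database, a tiny hand-written word set stands in for the English dictionary, the regular-expression tokenizer is an assumption, and the two heuristic rules in the hypothetical `is_candidate` helper are illustrative examples rather than the rules used by the authors.

```python
import re
import sqlite3

# Hypothetical toy stand-in for the English dictionary used as a stop-word
# list; a real system would load a full English word list.
ENGLISH_WORDS = {"patients", "were", "treated", "with", "and", "a", "daily",
                 "dose", "of", "was", "given"}


def build_name_db(entries):
    """Load (name, compound_id) pairs into an in-memory SQLite table.

    Names are lowercased so that lookups are case-insensitive.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE names (name TEXT PRIMARY KEY, cid INTEGER)")
    conn.executemany(
        "INSERT OR IGNORE INTO names (name, cid) VALUES (?, ?)",
        [(name.lower(), cid) for name, cid in entries],
    )
    return conn


def is_candidate(token):
    """Stop-word filter plus two hypothetical heuristic rules."""
    if token.lower() in ENGLISH_WORDS:
        return False  # ordinary English word, so not treated as a chemical name
    if len(token) < 2 or token.isdigit():
        return False  # hypothetical rule: skip single characters and bare numbers
    return True


def find_mentions(text, conn):
    """Return (start, end, mention, cid) for each candidate found in the database."""
    mentions = []
    for match in re.finditer(r"[\w\-]+", text):
        token = match.group()
        if not is_candidate(token):
            continue
        row = conn.execute(
            "SELECT cid FROM names WHERE name = ?", (token.lower(),)
        ).fetchone()
        if row is not None:
            mentions.append((match.start(), match.end(), token, row[0]))
    return mentions


if __name__ == "__main__":
    # Toy entries: names, a synonym, and a molecular formula mapped to made-up IDs.
    db = build_name_db([("aspirin", 1), ("acetylsalicylic acid", 1),
                        ("C9H8O4", 1), ("ibuprofen", 2)])
    sentence = ("Patients were treated with aspirin and ibuprofen; "
                "a daily dose of C9H8O4 was given.")
    for start, end, mention, cid in find_mentions(sentence, db):
        print(start, end, mention, cid)
    # Expected:
    # 27 34 aspirin 1
    # 39 48 ibuprofen 2
    # 66 72 C9H8O4 1
```

Because this sketch looks up one token at a time, a multi-word synonym such as "acetylsalicylic acid" is stored but can never be matched; covering such names would require querying multi-token candidate spans as well.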

Similar resources

Identification of BKCa channel openers by molecular field alignment and patent data-driven analysis

In this work, we present the first comprehensive molecular field analysis of patent structures on how the chemical structure of drugs impacts the biological binding. This task was formulated as searching for drug structures to reveal shared effects of substitutions across a common scaffold and the chemical features that may be responsible. We used the SureChEMBL patent database, which prov...

Analysis of User query refinement behavior based on semantic features: user log analysis of Ganj database (IranDoc)

Background and Aim: Information systems cannot be well designed or developed without a clear understanding of needs of users, manner of their information seeking and evaluating. This research has been designed to analyze the Ganj (Iranian research institute of science and technology database) users’ query refinement behaviors via log analysis. Methods: The method of this research is log anal...

CHEMDNER system with mixed conditional random fields and multi-scale word clustering

BACKGROUND The chemical compound and drug name recognition plays an important role in chemical text mining, and it is the basis for automatic relation extraction and event identification in chemical information processing. So a high-performance named entity recognition system for chemical compound and drug names is necessary. METHODS We developed a CHEMDNER system based on mixed conditional r...

Combining Machine Learning with Dictionary Lookup for Chemical Compound and Drug Name Recognition Task

Following the interest taken in Named Entity Recognition in the academic literature through the Gene Mention recognition task of BioCreative I and II, BioCreative IV hopes to make the implementation of the system in the field of detecting mentions of chemical compounds and drugs. Considering that the machine learning methods have obtained great success in the correct identification of gene and prot...

Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions

Automatic recognition of speech emotional states in noisy conditions has become an important research topic in the emotional speech recognition area, in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...

Publication date of DBCHEM: 2013